Author: Alexander Moradi

Details of variable construction as well as sources can be found in the Web Data-Appendix of the paper. Several primary sources are proprietary.

For raster data, we used QGIS' <Zonal statistics> tool to calculate the mean or sum of the pixels that fell within each grid cell.
For vector data, we used QGIS' <Join attributes by nearest> and <Distance to nearest hub> tools.
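
To illustrate what these two steps compute, here is a minimal Python sketch. It is not the QGIS implementation and was not used to build the data. The function names, the (x, y, value) pixel format, and the use of great-circle (haversine) distance are assumptions made for illustration only; QGIS may measure distance differently depending on the project settings.

```python
import math

def zonal_stats(pixels, cell_size):
    """Group (x, y, value) pixel centres into square grid cells.

    Returns {(col, row): {"sum": ..., "mean": ..., "count": ...}}.
    The cell index is the floor of the coordinate divided by cell_size.
    """
    acc = {}
    for x, y, value in pixels:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        total, count = acc.get(cell, (0.0, 0))
        acc[cell] = (total + value, count + 1)
    return {
        cell: {"sum": total, "mean": total / count, "count": count}
        for cell, (total, count) in acc.items()
    }

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points (degrees)."""
    radius_km = 6371.0088  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def nearest_hub(point, hubs):
    """Return (hub_name, distance_km) for the hub closest to point.

    point is (lon, lat); hubs maps a name to (lon, lat).
    """
    lon, lat = point
    distances = (
        (name, haversine_km(lon, lat, hub_lon, hub_lat))
        for name, (hub_lon, hub_lat) in hubs.items()
    )
    return min(distances, key=lambda item: item[1])

# Hypothetical example data, not taken from the replication files.
example_pixels = [(0.25, 0.25, 2.0), (0.75, 0.75, 4.0), (1.5, 0.5, 10.0)]
example_stats = zonal_stats(example_pixels, cell_size=1.0)
example_hub = nearest_hub((1.0, 0.0), {"A": (0.0, 0.0), "B": (10.0, 0.0)})
```

The first function corresponds to the raster case (mean or sum per grid cell), the last to the vector case (distance from a grid cell to the closest feature).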

We did this for each shapefile and merged the individual output (.dbf) files in Stata. The master do file is <compile_grid_data.do>. Several sub do files help to keep the workflow organised.
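
The merge step can be pictured with the following Python sketch. The actual merge is done in Stata by the do files listed below; this only illustrates the idea of attaching each per-variable table to the master grid by a common identifier. The key name grid_id, the function name, and the example values are hypothetical.

```python
def merge_on_id(master, *tables, key="grid_id"):
    """Attach columns from each table to the master rows that share the same key.

    Rows in the other tables that have no match in the master are ignored,
    so the result has exactly one row per master grid cell.
    """
    merged = {row[key]: dict(row) for row in master}
    for table in tables:
        for row in table:
            if row[key] in merged:
                merged[row[key]].update(row)
    return [merged[k] for k in sorted(merged)]

# Hypothetical example tables, not taken from the replication files.
grid = [{"grid_id": 1}, {"grid_id": 2}]
altitude = [{"grid_id": 1, "altitude": 350.0}, {"grid_id": 2, "altitude": 12.0}]
coast = [{"grid_id": 2, "dist2coast": 4.5}, {"grid_id": 3, "dist2coast": 80.0}]

combined = merge_on_id(grid, altitude, coast)
```

Grid cells without a match in one of the tables simply lack that column in this sketch, which is comparable to a missing value after a merge.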

Do files
1) <compile_grid_data.do> (Master do file)
2) <compile_grid_mines.do> (sub do file: mining locations and dates of opening)
3) <compile_grid_missions.do> (sub do file: locations of Roome (Nunn) and Beach (Cage and Rueda) missions)
4) <compile_grid_pop.do> (sub do file processing the population data from HYDE 3.1)
5) <calculate_export_values.do> (sub do file that calculates the value of production based on export statistics)

Input files
1) grid_africa.shp (contains the grid for Africa)
2a) grid_nearSNL.xls  (location of mines according to SNL)
2b) grid_nearUSGS.xls (location of mines according to USGS)
2c) grid_mining.dta (number of mines existing in 1900, 1914 & 1924. Source: SNL & Remi based on USGS)
3a) cr_grid.xls (location of Beach missions)
3b) roome_grid.xls (location of Roome missions, available from Nunn)
3c) grid_missions.dta (Beach/Roome missions in grids, available from Cage and Rueda)
4a) pop1400.xls, pop1500.xls, pop1600.xls, pop1700.xls, pop1800.xls, pop1900.xls, grid_pop.dta (population data from HYDE 3.1)
4b) grid_popurbrur.dta (urban and rural population 1900 from HYDE 3.0)
4c) cities1901_v7.xls, grid_pop2000.dta (urban population 1900-2010 from Census data)
4d) Chandler_1400_1800_africa.xls (population data 1400-1800 from Chandler)
5) grid_muslim.dta (distance to a Muslim seat/centre, distance to Muslim belt, in Muslim belt)
6) grid_murdock.dta (Ethnic group level data for polygamy monogamy slavery polity missing_murdock v99)
7) grid_malaria.dta (mean of sickle cell map, available from Depetris-Chauvin, 2018)
8) grid_tsetse.dta (mean of tsetse from Alsan, 2015)
9) grid_climate.dta (Rainfall and Temperature 1900-1929 from CPRU and GPCC respectively - For 2.5° grid resolution: Schneider, Udo; Becker, Andreas; Finger, Peter; Meyer-Christoffer, Anja; Rudolf, Bruno; Ziese, Markus (2011): GPCC Full Data Reanalysis Version 6.0 at 2.5°: Monthly Land-Surface Precipitation from Rain-Gauges built on GTS-based and Historic Data. DOI: 10.5676/DWD_GPCC/FD_M_V7_250)
10) grid_altitude.dta (Altitude based on 250m x 250m raster resolution)
11) grid_dist2coast.dta (Distance to coast)
12a) grid_crops.dta (crop suitability from GAEZ, index 0-10000)
12b) Cash_crops_africa_1900_1924.xls (export commodity values from Blue Books)
13) grid_dist2rail.dta (Distance to railroad)
14) grid_dist2placeborail.dta (Distance to placebo routes)
15) grid_dist2lake.dta (Distance to lakes in km)
16) grid_dist2river.dta (Distance to navigable river in km)
17) grid_dist2explorer.dta (Distance to Nunn's explorer routes in km)
18) grid_gridcell2.dta (Gridcell ID aggregating 0.1x0.1 cells into 0.2x0.2 cells)
19) grid_soil.dta (Soil suitable for cultivation from "Map 6.65: Combined suitability of currently available land for pasture and rainfed crops (low input level)"; includes land well-suited for rainfed crops & prime land for rainfed crops + irrigated areas)
19a) grid_soil5.dta (Soil fertility index Africa map %afferl, 5 categories (0-20, 20-40, 40-60, 60-80, 80-100))
19b) grid_area.dta (Share of land that is not covered by water (lake/ocean))
20) col_date: Date of colonisation of ethnic group from Henderson & Whatley - based on Thomas Pakenham's "The Scramble for Africa." 
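
As an illustration of item 18 (a gridcell ID that groups fine cells into coarser cells), here is a minimal Python sketch. The indexing scheme is an assumption: it supposes each 0.1-degree cell is identified by an integer column and row, which may differ from how grid_gridcell2.dta was actually built. Integer indices are used to avoid floating-point rounding when dividing coordinates.

```python
def parent_cell(col, row, factor=2):
    """Map a fine cell (col, row) to the coarser cell that contains it.

    With factor=2, each block of 2 x 2 fine cells shares one parent cell.
    Floor division keeps the mapping consistent for negative indices too.
    """
    return (col // factor, row // factor)

def aggregate_sum(values, factor=2):
    """Sum fine-cell values {(col, row): value} up to parent cells."""
    totals = {}
    for (col, row), value in values.items():
        parent = parent_cell(col, row, factor)
        totals[parent] = totals.get(parent, 0) + value
    return totals

# Hypothetical fine-cell values, not taken from the replication files.
fine = {(0, 0): 1, (1, 0): 2, (0, 1): 3, (1, 1): 4, (2, 0): 10, (-1, 0): 5}
coarse = aggregate_sum(fine)
```

In this sketch the four fine cells with indices 0 and 1 fall into one coarse cell, while (2, 0) and (-1, 0) belong to neighbouring coarse cells.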


Output files
1) grid_africa_01182018

Sources: See Web Data-Appendix of the paper
